Druid support via SQLAlchemy #4163
I'll fix the unit tests.
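```python
# Prequery: run the subquery first to fetch the top groups.
result = self.query(subquery_obj)
# Every returned column that is not a metric is a grouping dimension.
dimensions = [c for c in result.df.columns if c not in metrics]
# Turn the top groups into a filter and add it to the main query's WHERE clause.
top_groups = self._get_top_groups(result.df, dimensions)
qry = qry.where(top_groups)
```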
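`_get_top_groups` itself is not shown in this hunk; it presumably returns a SQLAlchemy clause, since its result is passed to `qry.where()`. As a hedged, stdlib-only illustration of what such a filter could look like, the sketch below assumes each top group becomes an AND of equality conditions and the groups are combined with OR. `top_groups_filter`, the `births` table, and its columns are hypothetical, and the sketch builds a parameterized SQL string rather than a SQLAlchemy expression.

```python
import sqlite3
from typing import Dict, List, Sequence, Tuple


def top_groups_filter(
    rows: Sequence[Dict[str, object]], dimensions: List[str]
) -> Tuple[str, list]:
    """Build a WHERE fragment matching any of the given groups.

    Each row becomes "(dim1 = ? AND dim2 = ? ...)" and the groups are OR-ed.
    Dimension names are inserted verbatim, so they must be trusted column
    names; the values are passed separately as bound parameters.
    """
    if not rows:
        return "1 = 0", []  # no top groups: match nothing
    clauses: List[str] = []
    params: list = []
    for row in rows:
        parts = []
        for dim in dimensions:
            parts.append(f"{dim} = ?")
            params.append(row[dim])
        clauses.append("(" + " AND ".join(parts) + ")")
    return " OR ".join(clauses), params


# Usage against a toy in-memory table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE births (name TEXT, gender TEXT, num INTEGER)")
conn.executemany(
    "INSERT INTO births VALUES (?, ?, ?)",
    [("Alice", "girl", 10), ("Bob", "boy", 7), ("Carol", "girl", 3), ("Dan", "boy", 1)],
)
top = [{"name": "Alice", "gender": "girl"}, {"name": "Bob", "gender": "boy"}]
where, params = top_groups_filter(top, ["name", "gender"])
result = conn.execute(
    f"SELECT name, SUM(num) FROM births WHERE {where} GROUP BY name ORDER BY name",
    params,
).fetchall()
```

Only the rows belonging to the two "top" groups survive the filter, which is the effect the main query needs when it cannot join against the subquery.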
If `where` is called multiple times, does SQLAlchemy apply a logical AND? I couldn't find the documentation for that method quickly...
Found the answer: a logical AND is applied.
For reference:

> return a new select() construct with the given expression added to its WHERE clause, **joined to the existing clause via AND**, if any.

(emphasis mine)
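The documented behavior is easy to check. A minimal sketch, assuming SQLAlchemy 1.3 or later is installed (the `birth_names` columns are placeholders): two chained `where()` calls compile to the same SQL as a single `where()` with an explicit `and_()`.

```python
from sqlalchemy import and_, column, table

# Lightweight table construct: no engine or MetaData is needed to compile SQL.
births = table("birth_names", column("gender"), column("num"))

# Two successive where() calls...
chained = births.select().where(births.c.gender == "girl").where(births.c.num > 100)
# ...versus a single where() with an explicit AND.
explicit = births.select().where(and_(births.c.gender == "girl", births.c.num > 100))

chained_sql = str(chained)
explicit_sql = str(explicit)
print(chained_sql)
```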
LGTM
* Use druiddb
* Remove auto formatting
* Show prequeries
* Fix subtle bug with lists
* Move arguments to query object
* Fix druid run_query
I recently created a module called `druiddb` (merged into `pydruid` this week) that provides a SQLAlchemy dialect for Druid. This allows Superset to talk to Druid using its standard SQLAlchemy connector instead of the custom one.

One problem with this approach is that Druid does not support joins, and some queries (timeseries with a limit) perform an inner join to get the top overall groups. To handle Druid correctly, I added a new attribute to engine specs called `inner_joins`, defaulting to `True`.

If this attribute is false, instead of building the inner join we run a "prequery" that fetches the top groups, similar to how the Druid connector works. The values are then used as an extra filter in the main query. For example, this is how a query with a join works (from the `birth_names` dataset):

And here is how it looks when inner joins are not supported:
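The two strategies can be compared on a toy table. This is a hedged sketch, not the SQL Superset generates: the `birth_names` rows, `TOP_GROUPS_SQL`, `timeseries_top_n`, and the `BaseEngineSpec`/`DruidEngineSpec` classes are illustrative assumptions; only the `inner_joins` attribute and its default come from the description above. SQLite stands in for both databases.

```python
import sqlite3
from typing import List, Tuple

# Toy rows for a hypothetical `birth_names` table: (ds, name, num).
ROWS = [
    ("2000", "Alice", 10), ("2000", "Bob", 7), ("2000", "Carol", 3),
    ("2001", "Alice", 12), ("2001", "Bob", 2), ("2001", "Carol", 9),
]

# Top-N names by total `num` over the whole time range.
TOP_GROUPS_SQL = (
    "SELECT name FROM birth_names "
    "GROUP BY name ORDER BY SUM(num) DESC LIMIT ?"
)


class BaseEngineSpec:
    """Hypothetical engine spec; only `inner_joins` mirrors the PR."""
    inner_joins = True


class DruidEngineSpec(BaseEngineSpec):
    inner_joins = False  # Druid cannot run the inner join


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE birth_names (ds TEXT, name TEXT, num INTEGER)")
    conn.executemany("INSERT INTO birth_names VALUES (?, ?, ?)", ROWS)
    return conn


def timeseries_top_n(conn, limit, engine_spec) -> Tuple[List[tuple], List[str]]:
    """Return (rows, SQL statements executed) for a timeseries limited to the top groups."""
    executed: List[str] = []
    if engine_spec.inner_joins:
        # One statement: join the data against the top-groups subquery.
        sql = (
            "SELECT b.ds, b.name, SUM(b.num) FROM birth_names AS b "
            f"JOIN ({TOP_GROUPS_SQL}) AS top_groups ON b.name = top_groups.name "
            "GROUP BY b.ds, b.name ORDER BY b.ds, b.name"
        )
        executed.append(sql)
        rows = conn.execute(sql, (limit,)).fetchall()
    else:
        # Prequery: fetch the top groups first...
        executed.append(TOP_GROUPS_SQL)
        top = [row[0] for row in conn.execute(TOP_GROUPS_SQL, (limit,))]
        # ...then use their values as an extra filter in the main query.
        placeholders = ", ".join("?" for _ in top)
        sql = (
            "SELECT ds, name, SUM(num) FROM birth_names "
            f"WHERE name IN ({placeholders}) "
            "GROUP BY ds, name ORDER BY ds, name"
        )
        executed.append(sql)
        rows = conn.execute(sql, top).fetchall()
    return rows, executed


conn = make_db()
joined, joined_sql = timeseries_top_n(conn, 2, BaseEngineSpec)
prequeried, prequery_sql = timeseries_top_n(conn, 2, DruidEngineSpec)
```

With `limit=2`, both paths return the rows for Alice and Carol (the two largest totals). The prequery path needs two round trips but no join, and its `executed` list holds both statements.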
Both queries are shown when clicking "View Query", instead of only the last one. See the screenshot:
To do that, I added two new arguments to the query object:

* `prequeries` is a list that stores prequeries;
* `is_prequery` is a boolean indicating whether a given query is the final one or a prequery.

When a main query runs a prequery, it appends it to `prequeries`. The functions `query` and `get_query_string_response` then take care of combining prequeries with the main query, so it can be displayed to the user correctly.
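The bookkeeping can be sketched in a few lines of plain Python. This is not Superset's code: the `QueryObject` dataclass, `get_query_str`, `build_view_query_text`, and the `";\n\n"` separator are assumptions; only the `prequeries` and `is_prequery` fields come from the description above.

```python
from dataclasses import dataclass, field
from typing import List

PRE_SQL = "SELECT name FROM birth_names GROUP BY name ORDER BY SUM(num) DESC LIMIT 2"
MAIN_SQL = (
    "SELECT ds, name, SUM(num) FROM birth_names "
    "WHERE name IN ('Alice', 'Carol') GROUP BY ds, name"
)


@dataclass
class QueryObject:
    """Hypothetical query object; only the last two fields mirror the PR."""
    sql: str
    prequeries: List[str] = field(default_factory=list)
    is_prequery: bool = False


def get_query_str(query_obj: QueryObject) -> str:
    """Return the SQL for a query, recording it first if it is a prequery."""
    if query_obj.is_prequery:
        query_obj.prequeries.append(query_obj.sql)
    return query_obj.sql


def build_view_query_text(main_sql: str, top_groups_sql: str) -> str:
    """Return the text for "View Query": every prequery, then the main query."""
    main = QueryObject(main_sql)
    # The prequery shares the main query's list, so its SQL is recorded there.
    get_query_str(QueryObject(top_groups_sql, main.prequeries, is_prequery=True))
    final_sql = get_query_str(main)
    return ";\n\n".join(main.prequeries + [final_sql])


text = build_view_query_text(MAIN_SQL, PRE_SQL)
print(text)
```

Sharing one list is what lets a prequery issued while the main query is being built still show up next to the final SQL.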